I.  Introduction

That  strategic  rivalry  in  a  long-term  relationship  may  differ  from  that 
of  a  one-shot  game  is  by  now  quite  a  familiar  idea.  Repeated  play  allows 
players  to  respond  to  each  other's  actions,  and  so  each  player  must  consider 
the  reactions  of  his  opponents  in  making  his  decision.   The  fear  of 
retaliation  may  thus  lead  to  outcomes  that  otherwise  would  not  occur.  The 
most  dramatic  expression  of  this  phenomenon  is  the  celebrated  "Folk 
Theorem."  An  outcome  that  Pareto  dominates  the  minimax  point  is  called 
individually  rational.   The  Folk  Theorem  asserts  that  any  individually 
rational outcome can arise as a Nash equilibrium in infinitely repeated games
with sufficiently little discounting. As Aumann and Shapley [1976] and
Rubinstein [1979] have shown, the same result is true when we replace the
word "Nash" by "(subgame) perfect" and assume no discounting at all.

Because  the  Aumann-Shapley/Rubinstein  result  supposes  literally  no 
discounting,  one  may  wonder  whether  the  exact  counterpart  of  the  Folk 
Theorem  holds  for  perfect  equilibrium,  i.e.,  whether  as  the  discount  factor 
tends  to  one,  the  set  of  perfect  equilibrium  outcomes  converges  to  the 
individually  rational  set.  After  all,  agents  in  most  games  of  economic 
interest  are  not  completely  patient;  the  no  discounting  case  is  of  interest 
as  an  approximation. 

It turns out that this counterpart is false. There can be a
discontinuity (formally, a failure of lower hemicontinuity) where the
discount factor, $\delta$, equals one, as we show in Example 3. Nonetheless the
games in which discontinuities occur are quite degenerate, and, in the end,
we can give a qualified "yes" (Theorem 2) to the question of whether the
Folk Theorem holds with discounting. In particular, it always holds in two-
player games (Theorem 1). This last result contrasts with the recent work
of Radner-Myerson-Maskin [1983] showing that, even in two-player games, the
equilibrium set may not be continuous at $\delta = 1$ in the discount factor if
players' moves are not directly observable and outcomes depend
stochastically on moves.

Until recently, the study of perfect equilibrium in repeated games
concentrated mainly on infinite repetitions without discounting
("supergames"). One early exception was Friedman [1971] and [1977], who
showed that any outcome that Pareto dominates a Nash equilibrium of the
constituent game (the game being repeated) can be supported in a perfect
equilibrium of the repeated game.¹ The repeated game strategies he
specified are particularly simple: after any deviation from the actions that
sustain the desired outcome, players revert to the one-shot Nash equilibrium
for the remainder of the game. More recently, Abreu [1982] established that
a highly restricted set of strategies suffices to sustain any perfect
equilibrium outcome. Specifically, whenever any player deviates from the
desired equilibrium path, that player can be "punished" by players'
switching to the worst possible equilibrium for the deviator regardless of
the history of the game to that point.

We exploit this idea of history-free punishments, by contrast with the
methods of Aumann-Shapley/Rubinstein, in the proofs of our Theorems 1 and 2,
which are constructive.² In the proof of the two-person "discounting folk
theorem" (Theorem 1), both players switch for a specified number of periods
to strategies that minimize their opponent's maximum payoff (i.e., minimax


¹Actually Friedman was explicitly concerned only with Nash equilibria of the
repeated game. The strategies that he proposed, however, constitute perfect
equilibria (see Theorem C of Section II).

²Lockwood [1983] characterizes the (smaller set of) equilibrium payoffs that
are possible when one restricts attention to punishments of the Aumann-
Shapley/Rubinstein variety.


strategies)  after  any  deviation.  This  type  of  punishment  strategy  is  also 
the  key  to  our  proof  of  the  Folk  Theorem  for  games  with  incomplete 
information,  the  other  topic  of  this  paper. 

Although  the  theory  of  infinitely  repeated  games  offers  an  explanation 
of  cooperation  in  ongoing  relationships,  economic  agents  often  have  finite 
lives.   If  the  game  has  a  long  but  finite  length,  the  set  of  equilibria  is 
often  much  smaller  than  the  folk  theorem  would  suggest.  The  classic  example 
here  is  the  repeated  prisoner's  dilemma:  with  a  fixed  finite  horizon  the 
only  equilibrium  involves  both  players'  confessing  every  period,  in  contrast 
with  the  cooperative  equilibrium  that  is  sustainable  with  an  infinite 
horizon.  However,  anecdotal  and  experimental  evidence  both  suggest  that 
cooperation  is  a  likely  outcome  with  a  large  but  finite  number  of 
repetitions. 

Recently Kreps-Wilson [1982a], Milgrom-Roberts [1982], and Kreps-
Milgrom-Roberts-Wilson [1982] have proposed a reason why a finite number of
repetitions  might  allow  cooperation.  Their  explanation  supposes  that 
players  are  uncertain  about  the  payoffs  or  possible  actions  of  their 
opponents.  Such  "incomplete  information"  in  the  prisoner's  dilemma 
precludes  applying  the  backwards-induction  argument  that  establishes  that 
the  players  must  confess  each  period.  Players  can  credibly  threaten  to  take 
suboptimal  actions  if  there  is  some  (small)  probability  that  the  action  is 
indeed  optimal,  because  they  have  an  interest  in  maintaining  their 
reputation  for  possible  "irrationality." 

The  examples  of  reputation  games  analyzed  to  date  have  the  apparent 
advantage,  compared  with  infinite-horizon  models,  of  having  substantially 
smaller sets of equilibria. However, the equilibrium set depends on the
precise form of irrationality specified. Our "incomplete information" Folk
Theorem  shows  that  by  varying  the  kind  of  irrationality  specified,  but  still 
keeping  the  probability  of  irrationality  arbitrarily  small,  one  can  trace 
out  the  entire  set  of  infinite-horizon  equilibria.   Thus,  in  a  formal  sense, 
the  two  approaches,  infinite  and  finite  horizon,  yield  the  same  results. 
However,  those  who  are  willing  to  choose  among  different  forms  of 
irrationality  may  still  find  the  incomplete  information  approach  useful. 
One  may  argue  for  or  against  certain  equilibria  on  the  basis  of  the  type  of 
irrationality  needed  to  support  them. 

We provide two different theorems for repeated games of incomplete
information. Our first result parallels Friedman's work on repeated games
with discounting: after a deviation the "crazy" player switches to a Nash-
equilibrium strategy of the constituent game. This simple form of
irrationality suffices to support any outcome that Pareto-dominates a (one-
shot) Nash equilibrium. Our second, and main, result uses a more complex
form  of  irrationality.  However,  the  basic  approach  is  the  same  as  in  our 
Folk  Theorem  with  discounting:  after  a  deviation  each  player  switches  to 
his  minimax  strategy  for  a  specified  number  of  periods. 

It  is  not  surprising  that  similar  kinds  of  arguments  should  apply  to 
both  infinite  horizon  games  with  discounting  and  finite  horizon  games.  Each 
type  of  game  entails  the  difficulty,  not  present  in  infinite  horizon  games 
without  discounting,  that  deviators  from  the  equilibrium  path  cannot  be 
"punished"  arbitrarily  severely.   This  limitation  is  a  problem  because  of 
the requirement of perfection. Deviators must be punished, but it must also
be  in  the  interest  of  the  punishers  to  punish.   That  is,  they  must 
themselves  be  threatened  with  punishment  if  they  fail  to  punish  a  deviator. 
Such  considerations  give  rise  to  an  infinite  sequence  of  potential 


punishments  that,  at  each  level,  enforce  the  punishments  of  the  previous 
level.   Depending  on  how  these  punishments  are  arranged,  they  may  have  to 
become  increasingly  severe  the  farther  out  in  the  sequence  they  lie.  This 
creates  no  problem  in  supergames  but  may  be  impossible  for  the  two  types  of 
games  that  we  consider.   It  seems  natural,  therefore,  to  study  these  two 
types  together. 

Section II presents the classical Folk Theorem and the Aumann-
Shapley/Rubinstein and Friedman variants. Section III discusses continuity
of the equilibrium correspondence as a function of the discount factor and
develops Folk Theorems for infinitely repeated games with discounting.
Section IV provides a simple proof that any payoffs that Pareto dominate a
(one-shot) Nash equilibrium can be sustained in an equilibrium of a finitely
repeated game with incomplete information. This result is the analog of the
Friedman [1971] result. Section V uses a more complex approach to prove a
Folk Theorem for these finitely repeated games.

Sections II-V assume that players can observe each other's past mixed
strategies (the assumption is justified in Section II). In Section VI, we
conclude by showing that our results continue to hold under the more
conventional hypothesis that players can observe only others' past actions.

II.  The Classical Folk Theorem

Consider a finite n-person game in normal form

$$g: A_1 \times \cdots \times A_n \to \mathbb{R}^n.$$

For now, we shall not distinguish between pure and mixed strategies, and so
we might as well suppose that the $A_i$'s consist of mixed strategies. Thus,
in a repeated game, we are assuming that a player can observe the others'
past mixed strategies (see, however, Section VI, where the assumption is
dropped). We can justify this assumption by supposing that the outcomes of
the randomizing devices are jointly observed ex post.³ Moreover, for
convenience we assume that the players can make their actions contingent on
the outcome of a public randomizing device. That is, they can play
correlated strategies.⁴ To see, however, that we can dispense with
correlated strategies, see the Remark following Theorem A.

For each j, choose $M^j = (M^j_1, \ldots, M^j_n)$ so that

$$(M^j_1, \ldots, M^j_{j-1}, M^j_{j+1}, \ldots, M^j_n) \in \arg\min_{a_{-j}} \max_{a_j} g_j(a_j, a_{-j}),$$

and define

$$v_j^* = \max_{a_j} g_j(a_j, M^j_{-j}) = g_j(M^j).$$ ⁵

The strategies $(M^j_1, \ldots, M^j_{j-1}, M^j_{j+1}, \ldots, M^j_n)$ are minimax strategies (which
may not be unique) against player j, and $v_j^*$ is the smallest payoff that the

³This is not quite right because, even if outcomes of the randomizing
devices were observable, one mixed strategy could not always be
distinguished from another if the two involved taking the same action for
some outcomes. However, this sort of partial observability would do for our
purposes. Indeed, the proofs in that case would be virtually the same as
with perfect observability.

⁴See Aumann [1974]. More generally, a correlated strategy might entail
having each player make his action contingent on a (private) signal
correlated with some randomizing device. We shall, however, ignore this
possibility.

⁵The notation "$a_{-j}$" denotes "$(a_1, \ldots, a_{j-1}, a_{j+1}, \ldots, a_n)$", and "$g_j(a_j, M^j_{-j})$"
denotes "$g_j(M^j_1, \ldots, M^j_{j-1}, a_j, M^j_{j+1}, \ldots, M^j_n)$".


other players can keep player j below.⁶ We will call $v_j^*$ player j's
reservation value and refer to $(v_1^*, \ldots, v_n^*)$ as the minimax point. Clearly,
in any equilibrium of g, whether or not g is repeated, player j's
expected average payoff must be at least $v_j^*$.

Henceforth we shall normalize the payoffs of the game g so that
$(v_1^*, \ldots, v_n^*) = (0, \ldots, 0)$.
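As a rough illustration of the reservation values and the normalization just described, the following sketch computes minimax values for a two-player game and shifts payoffs so that the minimax point becomes (0,0). It is restricted to pure strategies for simplicity, whereas the text allows mixed strategies (which can lower a player's minimax value). The payoff table and the function names `pure_minimax_values` and `normalize` are illustrative assumptions, not taken from the paper.

```python
def pure_minimax_values(payoffs):
    """payoffs[r][c] = (g1, g2) for row action r and column action c.

    Assumption: only pure strategies are considered.
    """
    rows = range(len(payoffs))
    cols = range(len(payoffs[0]))
    # The column player picks the column that minimizes player 1's
    # best-response payoff; symmetrically for player 2.
    v1 = min(max(payoffs[r][c][0] for r in rows) for c in cols)
    v2 = min(max(payoffs[r][c][1] for c in cols) for r in rows)
    return v1, v2


def normalize(payoffs):
    """Shift each player's payoffs so the (pure) minimax point is (0, 0)."""
    v1, v2 = pure_minimax_values(payoffs)
    return [[(g1 - v1, g2 - v2) for (g1, g2) in row] for row in payoffs]


# Hypothetical 2x2 game, chosen only for illustration.
game = [[(3, 3), (0, 4)],
        [(4, 0), (1, 1)]]
print(pure_minimax_values(game))
print(normalize(game))
```

After normalization, a payoff vector is individually rational exactly when every coordinate is strictly positive, which is the convention used in the definitions that follow.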
Let

$$U = \{(v_1, \ldots, v_n) \mid \exists\, (a_1, \ldots, a_n) \in A_1 \times \cdots \times A_n \text{ with } g(a_1, \ldots, a_n) = (v_1, \ldots, v_n)\},$$

$$V = \text{Convex Hull of } U,$$

and

$$V^* = \{(v_1, \ldots, v_n) \in V \mid v_i > 0 \text{ for all } i\}.$$

The set V consists of feasible payoffs, and $V^*$ consists of feasible
payoffs that Pareto dominate the minimax point. That is, $V^*$ is the set of
individually rational payoffs. In a repeated version of g, we suppose that
players maximize the discounted sum of single period payoffs, and we can now
state a version of the Folk Theorem.

Theorem A (The Folk Theorem): For any $(v_1, \ldots, v_n) \in V^*$ there exists a Nash
equilibrium of the infinitely repeated game where, for all i, player i's
average payoff is $v_i$ if players discount the future sufficiently little.


⁶Actually, if $n \geq 3$, the other players may be able to keep player j's payoff
even lower by using a correlated strategy against j, where the outcome of
the correlating device is not observed by j (another way of putting this is
to observe that, for $n \geq 3$, the inequality
$\max_{a_j} \min_{a_{-j}} g_j(a_j, a_{-j}) \leq \min_{a_{-j}} \max_{a_j} g_j(a_j, a_{-j})$
can hold strictly). In keeping with the rest of the
literature on repeated games, however, we shall rule out such correlated
strategies.




Proof: Let $(s_1, \ldots, s_n) \in A_1 \times \cdots \times A_n$ be a vector of strategies⁷ such that
$g(s_1, \ldots, s_n) = (v_1, \ldots, v_n)$. Suppose that in the repeated game each player i
plays $s_i$ until some player j deviates from $s_j$ (if more than one player
deviates simultaneously, we can suppose that the deviations are ignored).
Thereafter, assume that he plays $M^j_i$. These strategies form a Nash
equilibrium of the repeated game if there is not too much discounting; any
momentary gain that may accrue to player j if he deviates from $s_j$ is swamped by
the prospect of being minimaxed forever after.

Q.E.D.
Remark: If we disallowed correlated strategies, the same proof would
establish that any positive vector in U could be enforced as an equilibrium.
To obtain all other points in $V^*$, players could vary their moves from period
to period to convexify the set of attainable payoffs: i.e., in period 1,
players play $(a_1^1, \ldots, a_n^1)$, in period 2, $(a_1^2, \ldots, a_n^2)$, etc. By choosing these
vectors of strategies judiciously, we could ensure that the payoffs average
out (arbitrarily closely) to any desired $(v_1, \ldots, v_n)$.
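The averaging idea in this Remark can be sketched numerically. The snippet below computes the normalized discounted value of a stage-payoff stream that repeats a fixed cycle forever, and shows it approaching the simple average of the cycle as the discount factor approaches one. The function name `discounted_average` and the particular cycle are hypothetical choices made only for illustration.

```python
def discounted_average(cycle, delta):
    """Normalized discounted value (1 - delta) * sum_t delta**t * x_t
    of the infinite stream obtained by repeating `cycle` forever."""
    period = len(cycle)
    one_cycle = sum(delta**k * x for k, x in enumerate(cycle))
    # Geometric sum over successive cycles: 1 + delta**period + ...
    return (1 - delta) * one_cycle / (1 - delta**period)


# Hypothetical stage payoffs for one player; their simple mean is 0.5.
cycle = [1.0, 1.0, 1.0, -1.0]
for delta in (0.5, 0.9, 0.999):
    print(delta, discounted_average(cycle, delta))
```

With heavy discounting the early periods of the cycle dominate, so the order of play matters; as the discount factor rises toward one the ordering matters less and the value settles near the cycle mean.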

Of  course,  the  strategies  of  Theorem  A  do  not,  in  general,  form  a 
(subgame)  perfect  equilibrium  (such  an  equilibrium  is  a  configuration  of 
strategies that form a Nash equilibrium in all subgames; see Selten [1965]),
because,  if  a  player  deviates,  it  may  not  be  in  others'  interest  to  go 
through  with  the  punishment  of  minimaxing  him  forever.   However,  Aumann  and 
Shapley  [1976]  and  Rubinstein  [1979]  showed  that,  when  there  is  no 


⁷Or, if necessary, correlated strategies.


discounting, the counterpart of Theorem A holds for perfect equilibrium.

Theorem B (Aumann-Shapley/Rubinstein): For any $(v_1, \ldots, v_n) \in V^*$ there exists
a perfect equilibrium in the infinitely repeated game with no discounting,
where, for all i, player i's expected payoff each period is $v_i$.⁸

The  idea  of  the  proof  is  simple  to  express.   Once  again,  as  long  as 
everyone has previously conformed, players continue to play their $s_i$'s,
leading to payoff $v_i$. If some player j deviates, he is, as before,
minimaxed but, rather than forever, only long enough to wipe out any
possible gain that he obtained from this deviation. After this punishment,
the players go back to their $s_i$'s. To induce the punishers to go through
with their minimaxing, they are threatened with the prospect that, if any
one of them deviates from his punishment strategy, he in turn will be
minimaxed by the others long enough to make such a deviation not worthwhile.
Moreover, his punishers will be punished if any one of them deviates, etc.
Thus,  there  is  a  potential  sequence  of  successively  higher  order 
punishments,  where  the  punishment  at  each  level  is  carried  out  for  fear  the 
punishment  at  the  next  level  will  be  invoked. 

Theorem  B  is  not  an  exact  counterpart  of  Theorem  A  because  it  allows  no 
discounting at all (we investigate in Section III when an exact counterpart
holds).  Moreover,  the  strategies  of  the  proof  are  a  good  deal  more  complex 
than  those  of  Theorem  A.   One  well-known  case  that  admits  both  discounting 


⁸If there is no discounting, the sum of single-period payoffs cannot serve
as a player's repeated game payoff since the sum may not be defined. Aumann
and Shapley use (the lim infimum of) the average payoff; Rubinstein uses the
overtaking criterion.
and simple strategies is where the point to be sustained Pareto dominates
the payoffs of a Nash equilibrium of the constituent game g.

Theorem C (Friedman [1971] and [1977]): If $(v_1, \ldots, v_n) \in V^*$ Pareto
dominates the payoffs $(y_1, \ldots, y_n)$ of a (one-shot) Nash equilibrium
$(e_1, \ldots, e_n)$ of g, then, if players discount the future sufficiently little,
there exists a perfect equilibrium of the infinitely repeated game where,
for all i, player i's average payoff is $v_i$.

Proof: Suppose that players play actions that sustain $(v_1, \ldots, v_n)$ until
someone deviates, after which they play $(e_1, \ldots, e_n)$ forever. With
sufficiently little discounting, this behavior constitutes a perfect
equilibrium.

Q.E.D.


III.  The  Folk  Theorem  in  Infinitely  Repeated  Games  with  Discounting 

We now turn to the question of whether Theorem A holds for perfect
rather than Nash equilibrium. Technically speaking, we are investigating
the lower hemicontinuity⁹ of the perfect equilibrium average payoff
correspondence (where the independent variable is the discount factor, $\delta$) at
$\delta = 1$. We first remind the reader that this correspondence is upper
hemicontinuous.¹⁰


⁹A correspondence $f: X \to Y$ is lower hemicontinuous at $x = \bar{x}$ if for any $\bar{y} \in f(\bar{x})$
and any sequence $x^m \to \bar{x}$ there exists a sequence $y^m \to \bar{y}$ such that $y^m \in f(x^m)$
for all m.

¹⁰If Y is compact the correspondence $f: X \to Y$ is upper hemicontinuous at $\bar{x}$ if
for any sequence $x^m \to \bar{x}$ and any sequence $y^m \to \bar{y}$ such that $y^m \in f(x^m)$ for all
m, we have $\bar{y} \in f(\bar{x})$.
Theorem D: Let $V(\delta)$ = {$(v_1, \ldots, v_n) \in V$ | $(v_1, \ldots, v_n)$ are the average payoffs
of a perfect equilibrium of the infinitely repeated game where players have
discount factor $\delta$}. The correspondence $V(\cdot)$ is upper hemicontinuous at any
$\delta < 1$.

It is easy to give examples where $V(\cdot)$ fails to be lower hemicontinuous
at $\delta < 1$.

Example 1: Consider the following version of the Prisoner's Dilemma:

              C         D

     C       1,1      -1,2

     D       2,-1      0,0

For $\delta < 1/2$ there are no equilibria of the repeated game other than players'
choosing D every period. However at $\delta = 1/2$ many additional equilibria
appear, including playing C each period until someone deviates and
thereafter playing D. Thus $V(\cdot)$ is not lower hemicontinuous at $\delta = 1/2$.
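The threshold of 1/2 in Example 1 can be checked with a short calculation. The sketch below compares the discounted payoff from cooperating forever with the payoff from a one-shot deviation followed by permanent reversion to (D,D), using the stage payoffs of Example 1. The constant and function names are illustrative, and only the on-path deviation is checked (reversion to D forever is itself a one-shot Nash equilibrium played every period).

```python
from fractions import Fraction

# Stage payoffs to one player in Example 1 (the game is symmetric).
REWARD = Fraction(1)       # both play C
TEMPTATION = Fraction(2)   # play D against C
PUNISHMENT = Fraction(0)   # both play D


def grim_trigger_is_equilibrium(delta):
    """True if deviating once from perpetual (C, C) is not profitable
    when any deviation triggers (D, D) forever."""
    cooperate = REWARD / (1 - delta)                 # 1 + delta + delta^2 + ...
    deviate = TEMPTATION + delta * PUNISHMENT / (1 - delta)
    return cooperate >= deviate


def critical_discount_factor():
    # Solves REWARD / (1 - d) = TEMPTATION + d * PUNISHMENT / (1 - d) for d.
    return (TEMPTATION - REWARD) / (TEMPTATION - PUNISHMENT)


print(critical_discount_factor())
print(grim_trigger_is_equilibrium(Fraction(1, 2)))
print(grim_trigger_is_equilibrium(Fraction(49, 100)))
```

Exact rational arithmetic is used so that the boundary case, where the player is exactly indifferent, is not blurred by floating-point rounding.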


IIIA.  Two-Player  Games 

Our particular concern, however, is the issue of lower hemicontinuity
at $\delta = 1$, and we begin with two-player games. It turns out that, in this case,
the exact analog of Theorem A holds for perfect equilibrium. We should
point out, however, that to establish this analog we cannot use the Aumann-
Shapley/Rubinstein (AS/R) strategies of Theorem B once there is discounting.
To see that, if there is discounting, such strategies may not be able to
sustain all individually rational points, consider the following example.
Example 2:

              C         D

     C       1,1       0,-2

     D      -2,0      -1,-1
For this game the minimax point is (0,0), and so a "folk theorem" would
require that we be able to sustain, in particular, average payoff $(\epsilon, \epsilon)$,
where $0 < \epsilon < 1$. These payoffs can be approximated by the players' playing
(C,C) a fraction $(\epsilon + 1)/2$ of the time and (D,D) the remainder of the time
(for $\delta$ close to 1). However such behavior cannot be part of an AS/R type of
equilibrium. Suppose, for example, that one of the players (say, player I)
played C in a period where he was supposed to play D. In an AS/R
equilibrium, player II would "punish" I by playing D sufficiently long to
make I's deviation unprofitable. I's immediate gain from deviation is 1,
and I's best response to D is C, resulting in a payoff 0. Therefore if the
punishment lasts for $t_1$ periods, $t_1$ must satisfy

$$\frac{(1 - \delta^{t_1})\,\epsilon}{1 - \delta} > 1 + t_1 \cdot 0 = 1.$$

That is,

$$t_1 > \frac{\log\left(\dfrac{\delta + \epsilon - 1}{\epsilon}\right)}{\log \delta}. \tag{1}$$

Condition (1) can be satisfied as long as

$$\delta > 1 - \epsilon. \tag{2}$$

But in order to punish player I, II must himself suffer a payoff of -2 for
$t_1$ periods. To induce him to submit to such self-laceration, he must be
threatened with a $t_2$-period punishment, where

$$-2\,\frac{1 - \delta^{t_1}}{1 - \delta} + \frac{\delta^{t_1}\,(1 - \delta^{t_2 - t_1 + 1})\,\epsilon}{1 - \delta} > 1.$$

That is,

$$t_2 > -1 + \frac{\log\left(\dfrac{\delta^{t_1}\epsilon - 3 + 2\delta^{t_1} + \delta}{\epsilon}\right)}{\log \delta}. \tag{3}$$

Such a $t_2$ exists as long as

$$\delta^{t_1}\epsilon - 3 + 2\delta^{t_1} + \delta > 0,$$

which requires that

$$\delta > \left(\frac{2}{2 + \epsilon}\right)^{1/t_1}. \tag{4}$$

But (4) is a more stringent requirement than (2), since
$(2/(2 + \epsilon))^{1/t_1} > 1 - \epsilon$. Continuing iteratively, we find that, for
successively higher order punishments, $\delta$ is bounded below by a sequence of
numbers converging to 1. Since $\delta$ is itself strictly less than 1, however,
this is an impossibility, and so an AS/R equilibrium is impossible.
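To make the tightening of the bounds concrete, the sketch below evaluates the first two levels of punishment for given values of the discount factor and the target payoff. It assumes the first-level requirement reads $\epsilon(1 - \delta^{t_1})/(1 - \delta) > 1$ and that a second-level punishment exists only if $\delta^{t_1}\epsilon - 3 + 2\delta^{t_1} + \delta > 0$; both are a reading of the garbled displays in this example, and the function names are hypothetical.

```python
def min_first_level_length(delta, eps, t_max=1000):
    """Smallest t with eps * (1 - delta**t) / (1 - delta) > 1, or None
    if no t up to t_max works (assumed form of the first-level condition)."""
    for t in range(1, t_max + 1):
        if eps * (1 - delta**t) / (1 - delta) > 1:
            return t
    return None


def second_level_feasible(delta, eps, t1):
    """Assumed existence condition for a second-level punishment length:
    eps * delta**t1 - 3 + 2 * delta**t1 + delta > 0."""
    return eps * delta**t1 - 3 + 2 * delta**t1 + delta > 0


for delta in (0.5, 0.9, 0.99):
    t1 = min_first_level_length(delta, eps=0.5)
    ok = None if t1 is None else second_level_feasible(delta, 0.5, t1)
    print(delta, t1, ok)
```

A discount factor that is high enough to deter the original deviation can still be too low to make the punisher willing to punish, which is the first step of the escalation described in the text.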

The problem is that in this example the punisher is hurt more severely
by his punishment than is his victim. He must therefore be threatened with
an even stronger punishment. Without discounting, this can be arranged by
(roughly) taking the $t_i$'s to be a geometric series, as in Rubinstein [1979].
With discounting, however, arbitrarily long punishments are not arbitrarily
severe, because far-off punishments are relatively unimportant.

Nonetheless, in the two-player case we can get around the need for
arbitrarily severe punishments, as the following result shows.

Theorem 1: For any $(v_1, v_2) \in V^*$ there exists $\underline{\delta} \in (0,1)$ such that, for all
$\delta \in (\underline{\delta}, 1)$, there exists a subgame perfect equilibrium of the infinitely
repeated game in which player i's average payoff is $v_i$ when players have
discount factor $\delta$.

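Proof: Let $M_1$ be player one's minimax strategy against two, and $M_2$ player two's
minimax strategy against one. Take $\bar{v}_i = \max_{a_1, a_2} g_i(a_1, a_2)$. For $(v_1, v_2) \in V^*$
choose $\underline{\nu}$ and $\underline{\delta}$ such that

$$v_i > \bar{v}_i\,(1 - \underline{\delta}) + \underline{\delta}\, v_i^{**} \tag{5}$$

where

$$v_i^{**} = (1 - \underline{\delta}^{\underline{\nu}})\, g_i(M_1, M_2) + \underline{\delta}^{\underline{\nu}}\, v_i, \tag{6}$$

with

$$v_i^{**} > 0. \tag{7}$$

Condition (5) guarantees that player i prefers receiving $v_i$
forever to receiving his maximum possible payoff ($\bar{v}_i$) once, then
receiving $g_i(M_1, M_2)$ for $\underline{\nu}$ periods, and receiving $v_i$ thereafter. Condition (7)
guarantees that he prefers this punishment path to receiving
the reservation value, zero, each period. Clearly, for any $\delta > \underline{\delta}$ there is
a corresponding $\nu(\delta)$ such that (5) and (7) hold for $(\delta, \nu(\delta))$.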

Let $(s_1, s_2)$ be correlated one-shot strategies corresponding to
$(v_1, v_2)$: $g_i(s_1, s_2) = v_i$. Consider the following repeated game strategies
for player i:

(A) Play $s_i$ each period as long as $(s_1, s_2)$ was played last period.
After any deviation from (A),

(B) Play $M_i$ $\nu(\delta)$ times and then start again with (A). If there are
any deviations while in phase (B), then begin phase (B) again.

These strategies form a subgame-perfect equilibrium. Condition (5)
guarantees that deviation is not profitable in phase (A). In phase (B),
player i receives an average payoff of at least $v_i^{**}$ by not deviating. If
he deviates, he can obtain at most 0 the first period (because his
opponent, j, is playing $M_j$), and thereafter can average at most $v_i^{**}$.
Hence deviation is not profitable in phase (B).

Q.E.D.
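The choice of punishment length in this proof can be explored numerically. The sketch below lists the lengths for which conditions (5) and (7) hold at a given discount factor, taking the punishment-path average in (6) to be the mutual-minimax payoff for the stated number of periods followed by the target payoff forever. The parameter values are taken from the game of Example 2 (maximum payoff 1, mutual-minimax payoff -1 at (D,D)) with target payoff 0.5; the function names are hypothetical, and because that game is symmetric only one player's conditions are checked.

```python
def punishment_average(delta, nu, target, mutual_minimax):
    """Average payoff of the punishment path, as in (6): nu periods of the
    mutual-minimax payoff, then the target payoff forever."""
    return (1 - delta**nu) * mutual_minimax + delta**nu * target


def feasible_lengths(delta, target, best_payoff, mutual_minimax, nu_max=200):
    """All punishment lengths nu <= nu_max satisfying (5) and (7)."""
    lengths = []
    for nu in range(1, nu_max + 1):
        v_punish = punishment_average(delta, nu, target, mutual_minimax)
        cond5 = target > best_payoff * (1 - delta) + delta * v_punish
        cond7 = v_punish > 0
        if cond5 and cond7:
            lengths.append(nu)
    return lengths


print(feasible_lengths(0.9, target=0.5, best_payoff=1.0, mutual_minimax=-1.0))
print(feasible_lengths(0.6, target=0.5, best_payoff=1.0, mutual_minimax=-1.0))
```

The two conditions pull in opposite directions: (5) favors long punishments, which make deviation from phase (A) unattractive, while (7) caps the length so that sitting through phase (B) is still better than the reservation value.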
The idea behind the proof of Theorem 1 is easily summarized. After a
deviation by either player, each player minimaxes the other for a certain
number of periods, after which they return to the original path. If a
further deviation occurs during the punishment phase, the phase is begun
again.

Notice that in the proof of Theorem 1 the only place where we invoked
our assumption that past mixed strategies can be observed was in supposing
that deviations from the minimax strategies, $M_1$ and $M_2$, can be detected.
This assumption is dropped in Section VI.

IIIB.  Three  or  More  Players 

The method we used to establish Theorem 1, "mutual minimaxing," does
not extend to three or more players. This is because with, say, three
players there may exist no triple of alternatives $(M_1, M_2, M_3)$ such that $M_2$
and $M_3$ minimax player one, $M_1$ and $M_3$ minimax two, and $M_1$ and $M_2$ minimax
three; that is, the "mutual minimax" property may fail. However, the
situation is even worse: not only does the method of proving Theorem 1 fail
to extend, but the result itself does not generalize. To see this, consider
the following example.


Example 3:

     1,1,1     0,0,0               0,0,0     0,0,0

     0,0,0     0,0,0               0,0,0     1,1,1

In this game, player one chooses rows, player two chooses columns, and
player three, matrices. Note that whatever one player gets, the others get too.

Claim: For any $\delta < 1$ there does not exist a perfect equilibrium of the
supergame in which the average payoff $\varepsilon$ is less than 1/4 (the one-shot
mixed strategy equilibrium payoff).

Proof: For fixed $\delta < 1$, let $S = \{\varepsilon \mid \varepsilon$ sustainable as an average payoff in a
perfect equilibrium$\}$. Let $\alpha = \inf S$. We must show that $\alpha \ge 1/4$. Let

$\beta(x) = \min_{\sigma_1^*, \sigma_2^*, \sigma_3^*} \; \max_{\sigma_1, \sigma_2, \sigma_3} \{\, g_1(\sigma_1, \sigma_2^*, \sigma_3^*),\; g_2(\sigma_1^*, \sigma_2, \sigma_3^*),\; g_3(\sigma_1^*, \sigma_2^*, \sigma_3) \mid g_1(\sigma_1^*, \sigma_2^*, \sigma_3^*) = x \,\}.$

That is, $\beta(x)$ is the minimum that the most fortunate defector can obtain in
a (one-shot) deviation from a configuration of strategies in which players'
payoffs are $x$. We claim that $\beta(x) \ge 1/4$ for all $x$. Hence, the mutual
minimax property does not hold.

To see this, let $a_i$ be the probability that player i plays the "first"
pure strategy, i.e., the first row, column, or matrix as appropriate. For
some player i it must be the case that, for $j \ne i \ne k$, either $a_j \ge 1/2$ and
$a_k \ge 1/2$, or $a_j \le 1/2$ and $a_k \le 1/2$. But since player i can obtain any convex
combination of $a_j a_k$ and $(1-a_j)(1-a_k)$ as a payoff, he can get a payoff of at
least 1/4.

Thus in any equilibrium, the payoff to deviating is at least $1/4 + \delta\alpha/(1-\delta)$.
Let $\{\varepsilon_m\}$ be a sequence of possible average payoffs in perfect equilibria,
where $\varepsilon_m \to \alpha$. For all m we have

$\frac{1}{4} + \frac{\delta\alpha}{1-\delta} \le \frac{\varepsilon_m}{1-\delta}.$

Hence,

$\frac{1}{4} + \frac{\delta\alpha}{1-\delta} \le \frac{\alpha}{1-\delta},$

and so $\alpha \ge 1/4$.

Q.E.D. 
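
The key step of the proof, that some player can always secure at least 1/4 by a one-shot deviation, can be checked numerically. The following is a minimal sketch, not part of the original argument: the function name `best_deviation_payoff` and the grid step of 1/20 are illustrative assumptions. It evaluates, for each profile of mixing probabilities, the best payoff any one player can obtain from a pure reply in the game of Example 3.

```python
from itertools import product

def best_deviation_payoff(a):
    """Largest payoff any single player can secure with a pure one-shot reply.

    a[i] is the probability that player i plays his "first" pure strategy.
    Player i earns a_j * a_k from his first strategy and
    (1 - a_j) * (1 - a_k) from his second, where j and k are the other two.
    """
    best = 0.0
    for i in range(3):
        j, k = [p for p in range(3) if p != i]
        first = a[j] * a[k]
        second = (1 - a[j]) * (1 - a[k])
        best = max(best, first, second)
    return best

# Illustrative grid over mixed-strategy profiles (step 1/20 is an arbitrary choice).
grid = [n / 20 for n in range(21)]
beta = min(best_deviation_payoff(a) for a in product(grid, repeat=3))
worst_profiles = [a for a in product(grid, repeat=3)
                  if abs(best_deviation_payoff(a) - beta) < 1e-12]
```

On this grid the minimum is 1/4 and it is attained when every player mixes with probability 1/2, which is consistent with the claim that no configuration holds every potential defector below 1/4.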
The game of Example 3 is degenerate in the sense that $V^*$, the
individually rational set, is one-dimensional. This degeneracy is
responsible for the discontinuity in $V(\delta)$, as the next result demonstrates.

Theorem 2: Assume that the dimensionality of $V^*$ equals the number of
players. Then, for any $(v_1,\ldots,v_n)$ in $V^*$, there exists $\underline{\delta} \in (0,1)$ such that
for all $\delta \in (\underline{\delta}, 1)$ there exists a subgame-perfect equilibrium of the infinitely
repeated game in which player i's average payoff is $v_i$ when players have
discount factor $\delta$.

Proof: Choose $s = (s_1,\ldots,s_n)$ so that $g(s_1,\ldots,s_n) = (v_1,\ldots,v_n)$ (again
we allow correlated strategies for simplicity). Also choose $(v_1',\ldots,v_n')$ in
the interior of $V^*$ such that $v_i > v_i'$ for all i. For each $\delta \in (0,1)$ and each
i, choose an integer $\nu_i$ such that, as $\delta \to 1$, $(1-\delta^{\nu_i+1})/(1-\delta) \to K$, where

(8)    $\bar v_i / v_i' < K < \infty$,

and where, as before, $\bar v_i$ is player i's greatest one-shot payoff. Let $M^j =
(M_1^j,\ldots,M_n^j)$ be an n-tuple of strategies such that the strategies for players
other than j together minimize player j's maximum payoff, and such that
$g_j(M^j) = 0$. Let $w_i^j = g_i(M^j)$ be player i's per-period payoff when minimaxing
player j. Since $(v_1',\ldots,v_n')$ is in the interior of $V^*$ and $V^*$ has full
dimension, there exists $\varepsilon > 0$ so that, for each j,

$(v_1' + \varepsilon, \ldots, v_{j-1}' + \varepsilon, v_j', v_{j+1}' + \varepsilon, \ldots, v_n' + \varepsilon)$

is in $V^*$. Let $T^j = (T_1^j,\ldots,T_n^j)$ be a joint strategy that realizes these
payoffs. Consider the following repeated game strategy for player i:

(A) Play $s_i$ each period as long as s was played last period.
If player j deviates from (A),^11 then

(B) Play $M_i^j$ for $\nu_j$ periods, and then

(C) Play $T_i^j$ thereafter.

If player k deviates in phase (B) or (C), then begin phase (B) again with
j = k.^12

^11 If several players deviate from (A) simultaneously, then we can just as
well suppose that everyone ignores the deviation and continues to play s.

If player i deviates in phase (A) and then conforms, he receives at
most $\bar v_i$ the period he deviates, zero for $\nu_i$ periods, and $v_i'$ each period
thereafter. His total payoff, therefore, is no greater than

(9)    $\bar v_i + \frac{\delta^{\nu_i+1}}{1-\delta} v_i'$.

If he conforms throughout, he obtains $v_i/(1-\delta)$, so that, because $v_i > v_i'$, the gain to
deviating is less than

(10)    $\bar v_i - \frac{1-\delta^{\nu_i+1}}{1-\delta} v_i'$,

which is negative from formula (8). Note that (10) is also negative for all
$\delta' > \delta$. If player i deviates in phase (B) when he is being punished, he
obtains at most zero the period in which he deviates, and then only
lengthens his punishment, postponing the positive payoff $v_i'$. If player i
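deviates in phase (B) when player j is being punished, and then conforms, he
receives at most $\bar v_i + \frac{\delta^{\nu_i+1}}{1-\delta} v_i'$, which, from our analysis of phase (A), is no
more than $v_i/(1-\delta)$. If, however, he does not deviate, he receives at least

$w_i^j \frac{1-\delta^{\nu}}{1-\delta} + \frac{\delta^{\nu}}{1-\delta}(v_i' + \varepsilon)$

for some $\nu$ between 1 and $\nu_j$. Thus the gain to deviating is at most

(11)    $\bar v_i - w_i^j \frac{1-\delta^{\nu}}{1-\delta} + \frac{\delta^{\nu_i+1} v_i' - \delta^{\nu}(v_i' + \varepsilon)}{1-\delta}$.

As $\delta \to 1$, the first two terms in (11) remain finite because
$(1-\delta^{\nu})/(1-\delta) \le (1-\delta^{\nu_j+1})/(1-\delta)$, which converges
to K. But, because $\delta^{\nu}$ and $\delta^{\nu_i+1}$ converge to 1, the numerator of the last
term converges to $-\varepsilon$, and so the last term converges to negative
infinity. Thus there exists $\underline{\delta}_i < 1$ such that for all $\delta > \underline{\delta}_i$, player i will
not deviate in phase (B) if the discount factor is $\delta$.

^12 As in footnote 11, we can suppose that simultaneous deviation by several
players is ignored.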

Finally,  the  argument  for  why  players  do  not  deviate  in  phase  (C)  is 
practically  the  same  as  that  for  phase  (A). 

Q.E.D.

The  idea  behind  the  proof  of  Theorem  2  is  simple.   If  a  player 
deviates,  he  is  minimaxed  by  the  other  players  long  enough  to  wipe  out  any 
gain  from  his  deviation.   To  induce  the  other  players  to  go  through  with 
minimaxing  him,  they  are  ultimately  given  a  "reward"  in  the  form  of  an 
additional  "e"  in  their  average  payoff.   The  possibility  of  providing  such  a 
reward  relies  on  the  full  dimensionality  of  the  payoff  set. 
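
The phase (A) comparison in the proof can be illustrated numerically. The sketch below is a hedged illustration, not the authors' calculation: the function names and the sample payoffs are assumptions. It compares the total discounted payoff from a single deviation (the greatest one-shot payoff once, zero for `nu` periods, then `v_prime` forever) with the payoff from conforming forever.

```python
def deviation_total(vbar, v_prime, delta, nu):
    """Upper bound on the deviator's total payoff: vbar in the deviation
    period, zero during nu punishment periods, v_prime in every later period."""
    return vbar + delta ** (nu + 1) * v_prime / (1 - delta)

def conforming_total(v, delta):
    """Total discounted payoff from receiving v in every period."""
    return v / (1 - delta)

def gain_from_deviating(vbar, v, v_prime, delta, nu):
    """Positive values mean the one-shot deviation would pay."""
    return deviation_total(vbar, v_prime, delta, nu) - conforming_total(v, delta)

# Hypothetical numbers: vbar = 3, v = 1.2, v_prime = 1, nu = 3 punishment periods.
patient = gain_from_deviating(3.0, 1.2, 1.0, delta=0.9, nu=3)    # about -2.44
impatient = gain_from_deviating(3.0, 1.2, 1.0, delta=0.5, nu=3)  # about +0.73
```

With these illustrative numbers the deviation is unprofitable for the patient player and profitable for the impatient one, which is the sense in which the result requires the discount factor to be close enough to one.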

IV.   Incomplete Information with Nash Threats

Suppose that a game is repeated finitely many times, $\nu$, that players
maximize the (expected) sum of their one-shot payoffs, and that players can
observe all past one-shot strategies (including mixed strategies). This
repeated game can be embedded in a $\nu$-period sequential game of incomplete
information. Suppose that players' payoffs and, perhaps, even their action
spaces $A_i$ depend on their types (although we shall not explicitly consider
this latter type of incomplete information). With probability, say, $1 - \varepsilon$, a
given player i is described by $g_i$. We call a player of this type "sane" or
"rational." However, with probability $\varepsilon$ his payoffs and action spaces may
be different and might even be period-dependent. Such a player we call
"crazy." The motivation for suggesting this possibility is that often one
cannot be sure what kind of player one is up against. One might be almost
sure, but even if $\varepsilon$ is nearly zero, one may nevertheless wish to take into
account other possibilities. Indeed, as the following result shows, any
vector of payoffs Pareto dominating a Nash equilibrium of the constituent
game, g, can arise (approximately)^13 as the average payoffs of a perfect
equilibrium of a game of incomplete information^14 that, with high
probability, is just a finitely repeated version of g. The result,
therefore, is the counterpart for finitely repeated games of incomplete
information of Friedman's Theorem C above.

Theorem 3: Let $(e_1,\ldots,e_n)$ be a Nash equilibrium of the game, g, and let
$(y_1,\ldots,y_n) = g(e_1,\ldots,e_n)$. For any $\varepsilon > 0$ and any $(v_1,\ldots,v_n) \in V^*$ such
that $v_i > y_i$ for all i, there exists $\underline{\nu}$ such that for any $\nu > \underline{\nu}$ there exists
a $\nu$-period sequential game where, with probability $1-\varepsilon$, player i is
described in each period by $g_i$ and in which there exists a sequential
equilibrium where player i's average payoff is within $\varepsilon$ of $v_i$.

^13 The qualification "approximately" is necessary because the game is
repeated only finitely many times.

^14 Because the game is one of incomplete information, we must use some sort
of Bayesian perfect equilibrium concept. We shall adopt the sequential
equilibrium of Kreps and Wilson [1982b]. According to this concept a player
has probabilistic beliefs about other players' types that are updated in
Bayesian fashion according to what other players do. An equilibrium is a
configuration of strategies as functions of players' types such that, at
every point of the game, each player's strategy is optimal for him, given
others' strategies and his beliefs about their types. (Actually the concept is
a bit more refined than this, but, given the simple structure of our games,
this description will do.)

Remark: Notice that the theorem asserts the existence of a game as well as
of an equilibrium. This enables us to choose the form of the incomplete
information.

Proof: As above, let $\bar v_i = \max_{a_1,\ldots,a_n} g_i(a_1,\ldots,a_n)$. Also define
$\underline{v}_i = \min_{a_1,\ldots,a_n} g_i(a_1,\ldots,a_n)$. Choose $s = (s_1,\ldots,s_n)$ so that
$g(s_1,\ldots,s_n) = (v_1,\ldots,v_n)$.

We will consider a sequential game where each player i can be of two
types: "sane," in which case his payoffs are described by $g_i$, and "crazy,"
in which case he plays $s_i$ each period as long as s has always been played
previously and otherwise plays $e_i$. Players initially attach probability $\varepsilon$
to player i's being crazy and probability $1-\varepsilon$ to i's being sane. We shall
see that early enough in the game, both types of player i play $s_i$ if there
have been no deviations from s. Hence, a deviation from $s_i$ constitutes an
"impossible" event, one for which we cannot apply Bayes' rule, and so we must
specify players' beliefs about i in such an event. We shall suppose that
then all players attach probability one to player i's being sane.

Now starting at any point of this sequential game where there has
already been a deviation from s, it is clear that one sequential equilibrium
of the continuation game consists of all players playing Nash strategies
(the $e_i$'s) until the end of the game. We shall always select this
equilibrium.


Choose $\underline{\nu}$ so that

(12)    $\underline{\nu} > \max_i \frac{\bar v_i - (1-\varepsilon^{n-1})\,\underline{v}_i}{\varepsilon^{n-1}(v_i - y_i)}$.

We will show that in a period with $\nu$ periods remaining in the game,
where $\nu \ge \underline{\nu}$, the sane type of player i will play $s_i$ if there have been no
deviations from s to that point. If that period he plays something other
than $s_i$, his maximum payoff is $\bar v_i$. Subsequently his payoff is $y_i$ every
period, since, starting from any point after a deviation from s, we always
select the "Nash" sequential equilibrium. Thus, if he deviates from $s_i$ with
$\nu$ periods remaining, an upper bound to i's payoff for the rest of the game
is

(13)    $\bar v_i + (\nu - 1) y_i$.

Suppose, on the other hand, he uses the sequential strategy of playing
$s_i$ each period until someone deviates from s and thereafter playing $e_i$. In
that case, his payoff is $v_i$ each period for the rest of the game if the
other players are all crazy. If at least one of the other players is not
crazy, the worst that could happen to i is that his payoff is $\underline{v}_i$ in the
first period and $y_i$ in each subsequent period. Now, assuming that there
have been no previous deviations from s, the probability that all the others
are crazy is $\varepsilon^{n-1}$. Hence, a lower bound to i's payoff if he uses this
sequential strategy is

(14)    $\varepsilon^{n-1} \nu v_i + (1-\varepsilon^{n-1})(\underline{v}_i + (\nu - 1) y_i)$.

From (12), (14) is bigger than (13). Hence all players i will play $s_i$ in
any period at least $\underline{\nu}$ periods from the end. Thus, for any $\varepsilon > 0$, we can
choose $\nu$ big enough so that player i's average payoff of the $\nu$-period
sequential game is within $\varepsilon$ of $v_i$.

Q.E.D.
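
The comparison between the deviation bound (13) and the conforming bound (14) can be made concrete with a short calculation. This is a minimal sketch under assumed numbers, not part of the proof: the function names, the search cap `max_nu`, and the sample payoffs are all hypothetical. It searches directly for the smallest horizon at which (14) exceeds (13), rather than using a closed-form threshold.

```python
def deviate_bound(vbar, y, nu):
    """Bound (13): best one-shot payoff now, Nash payoff y in the other nu - 1 periods."""
    return vbar + (nu - 1) * y

def conform_bound(v, vlow, y, eps, n, nu):
    """Bound (14): with probability eps**(n-1) all opponents are crazy and
    player i earns v for nu periods; otherwise he earns at least vlow once
    and y in each of the remaining nu - 1 periods."""
    p_all_crazy = eps ** (n - 1)
    return p_all_crazy * nu * v + (1 - p_all_crazy) * (vlow + (nu - 1) * y)

def smallest_horizon(v, vbar, vlow, y, eps, n, max_nu=10_000):
    """Smallest number of remaining periods at which conforming beats deviating."""
    for nu in range(1, max_nu + 1):
        if conform_bound(v, vlow, y, eps, n, nu) > deviate_bound(vbar, y, nu):
            return nu
    return None  # no horizon up to max_nu works

# Hypothetical two-player example: target payoff 2, Nash payoff 1,
# best one-shot payoff 3.2, worst payoff -1, prior probability of crazy 0.25.
horizon = smallest_horizon(v=2.0, vbar=3.2, vlow=-1.0, y=1.0, eps=0.25, n=2)
```

With these assumed values the search returns 15: once at least 15 periods remain, even a small prior probability of facing a crazy opponent makes conforming worthwhile.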

V.   The  Folk  Theorem  in  Finitely  Repeated  Games  of  Incomplete  Information 

In this section we strengthen the result of Section IV by showing
roughly that any individually rational point can be sustained
(approximately) as the average equilibrium payoffs of a finitely repeated
game if the number of repetitions is large enough. This assertion is not
quite true for the same reason that the perfect equilibrium counterpart to
Theorem A does not hold for three or more players: a discontinuity in $V(\delta)$
can occur if the payoff set is degenerate. For this reason we confine
attention to two-player games.^15

Theorem 4: For any $(v_1, v_2) \in V^*$ and any $\varepsilon > 0$ there exists $\underline{\nu}$ such that for
any $\nu > \underline{\nu}$ there exists a $\nu$-period sequential game such that, with
probability $1 - \varepsilon$, player i is described in each period by $g_i$ and in which
there exists a sequential equilibrium where player i's average payoff is
within $\varepsilon$ of $v_i$.

Proof: As in Theorem 3, the proof is constructive. We shall construct
"crazy" player types designed to sustain $(v_1, v_2)$ approximately as
equilibrium average payoffs.

Let $x_i = g_i(M_1, M_2)$. Clearly $x_i \le 0$. Because g is a finite game, it
has a Nash equilibrium $(e_1, e_2)$. Let $(y_1, y_2)$ be the expected payoffs
corresponding to $(e_1, e_2)$. Then $y_i \ge 0$. As before, we suppose for
convenience that players can use correlated mixed strategies. Let $(s_1, s_2)$
be a correlated strategy yielding payoffs $(v_1, v_2)$. Let

(15)    $\beta = \max_i (\bar v_i / v_i)$,

and, as before, let $\underline{v}_i = \min_{a_1, a_2} g_i(a_1, a_2)$. If $y_i > 0$, choose an integer $\alpha_i$
and a rational number $q_i \in (1-\varepsilon, 1)$ so that

(16)    $\bar v_i + (\alpha_i + \beta)(1-\varepsilon) y_i < \alpha_i q_i y_i + \beta x_i$.

If $y_i = 0$, set $q_i = 0$. Let $q = \max(q_1, q_2)$. If $y_i = 0$, take $\alpha_i$ to be an
integer such that

(17)    $\bar v_i < \alpha_i (1-q)\,\varepsilon\, v_i + \beta x_i$.

Take $\alpha$ to be an integer greater than $\alpha_1$ and $\alpha_2$ such that $q\alpha$ is an integer.

^15 If we posited full dimension we could also establish the result for three
or more players; i.e., we could establish the analog of Theorem 3.

To describe the equilibrium play and the "crazy" player types we
partition the game into four "phases." We will number the periods so that
the game ends in period 1. Phase I runs from period $\nu$ to period $\alpha + \beta + 1$;
Phase II from $\alpha + \beta$ to $\alpha + 1$; Phase III from $\alpha$ to $\alpha(1-q) + 1$; and Phase IV
from $\alpha(1-q)$ to 1.

We will specify crazy behavior recursively, that is, in each period we
specify how the crazy player will behave if play to that point has
corresponded to the "crazy" play specified for the previous periods, and
also how the crazy player will respond to any deviation from that behavior.

Let us begin with Phase I. We define the index $\gamma(t)$ as follows. Set
$\gamma(\nu) = \nu$. In period t, $\nu \ge t > \alpha + \beta$, the crazy type plays $s_i$ if $\gamma(t) - t \ge \beta$,
and $M_i$ otherwise. We set $\gamma(t) = \gamma(t+1)$ if there was no deviation from
"crazy" behavior in period t+1, and $\gamma(t) = t$ otherwise. Thus the crazy type
starts out by minimaxing his opponent for the first $\beta$ periods,^16 and then
switches to $s_i$ if there have been no deviations. Any deviation restarts the
mutual punishment portion of the sequence, which runs for $\beta$ periods after
the deviation.

^16 It may seem peculiar that we begin the game in a "punishment" phase. This
is not necessary, but simplifies the description of the player types.
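
The recursive definition of the Phase I index can be traced with a small simulation. The sketch below is illustrative only: the function name `phase_one_play`, the labels `"s"` and `"M"`, and the sample parameters are assumptions, and `beta` is taken to be an integer number of periods. Periods are numbered downward, so the game ends in period 1.

```python
def phase_one_play(nu, alpha, beta, deviation_periods=()):
    """Crazy-type actions in Phase I, for periods nu down to alpha + beta + 1.

    gamma starts at nu; it is carried over from period t + 1 if nobody
    deviated there, and reset to t otherwise. The crazy type plays "s"
    when gamma - t >= beta and the minimax action "M" otherwise.
    """
    deviations = set(deviation_periods)
    gamma = nu
    plan = []
    for t in range(nu, alpha + beta, -1):
        if (t + 1) in deviations:
            gamma = t  # a deviation last period restarts the punishment count
        action = "s" if gamma - t >= beta else "M"
        plan.append((t, action))
    return plan

# Hypothetical horizon: nu = 10, alpha = 2, beta = 2, so Phase I is periods 10..5.
no_deviation = phase_one_play(10, 2, 2)
one_deviation = phase_one_play(10, 2, 2, deviation_periods={7})
```

Without deviations the crazy type minimaxes in periods 10 and 9 and then plays `"s"`; a deviation in period 7 triggers two further minimax periods (6 and 5), matching the description that punishment runs for `beta` periods after a deviation.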

In Phases II, III and IV crazy behavior depends on an additional
history-dependent index variable, $\theta$. $\theta$ has four possible values: $\theta = 0$ if
there have been no deviations from crazy behavior in Phase II or Phase IV; $\theta
= 1$ if player one was the first to deviate in Phase II; $\theta = 2$ if player two
was the first to deviate in Phase II; and $\theta = b$ if the first deviation in
Phase II was simultaneous deviation by both players or if there were no
deviations in Phase II but there has been a deviation in Phase IV. Note
that $\theta$ is unchanged by play in Phase III.

In Phase II the crazy type of player i plays: $s_i$ if $\theta = 0$ and $\gamma(t) - t \ge
\beta$; $M_i$ if $\theta = 0$ and $\gamma(t) - t < \beta$; $M_i$ if $\theta = j$ ($j \ne i$); and $e_i$ (the one-shot Nash
equilibrium strategy) if $\theta = i$ or b. In Phase III, a crazy type plays $e_i$ if
$\theta = 0$, i, or b, and $M_i$ if $\theta = j$. In Phase IV, the crazy types play $s_i$ if $\theta
= 0$, and $e_i$ otherwise.

Note that once there has been a deviation in Phase II or IV, $\theta$ is fixed
for the remainder of the game, and the play of the crazy types is thereafter
independent of subsequent developments. In particular, if player j is the
first to deviate in Phase II, the crazy type of player i plays the minimax
strategy $M_i$ for the remainder of the game. As we will see, Phases III and
IV are sufficiently long that an epsilon probability of this unrelenting
punishment is sufficient to prevent deviation from $(s_1, s_2)$. However,
deviations in Phase III do not change $\theta$, so that if $\theta = 0$ at the beginning
of Phase III, the crazy types will play Nash for the duration of that phase.

Next we describe the behavior of the "sane" types of each player. For
a sequential equilibrium we must specify both a strategy for each player,
mapping observations into actions, and a system of beliefs, mapping
observations into inferences. In Phase I, each rational type's strategy is
the same as the corresponding crazy strategy. If his opponent deviates from
crazy behavior, the rational player's beliefs are unchanged; he continues
to assign the ex-ante probabilities of $\varepsilon$ and $1-\varepsilon$, respectively, to his
opponent being crazy or sane.

In Phase II, if, in state $\theta = 0$, a player deviates from crazy behavior,
his opponent attaches probability 1 to his being crazy. The sane type's
strategy in Phase II is as follows: if $\theta = 0$ he plays as a crazy type. If
$\theta = j$ or b, he plays $e_i$. We do not specify how he plays if $\theta = i$.

In Phase III, the sane type plays $e_i$ if $\theta = 0$, j, or b. The player's
beliefs are unchanged by deviations from crazy play in Phase III if $\theta = 0$,
j, or b. We do not specify sane behavior for Phase IV, except to require
(a) that it depend on past outcomes only through the player's beliefs and $\theta$
and (b) that, if $\theta$ equals j, then the sane type of player i plays $e_i$. Thus
we choose some equilibrium behavior for each set of initial Phase IV beliefs
and $\theta$. The exact nature of this behavior (except for qualification (b)) and
the behavior in Phases II and III if $\theta = i$ is irrelevant to our analysis.
We know that there must exist an equilibrium for such subgames,^17 and we
will show that, regardless of the form of this "endplay," there is a
sequential equilibrium of the whole game in which sane types play as
described in Phases I and II. Our specified behavior and updating rules are
summarized in Table 1.

^17 To establish existence of Phase IV equilibria we may suppose that players
choose not to observe each other's randomizing device in Phase IV. Then the
usual existence theorem (see Kreps and Wilson [1982b]) applies.

[TABLE 1: for each phase (I: periods $\nu$ to $\alpha+\beta+1$; II: periods $\alpha+\beta$ to $\alpha+1$; III: periods $\alpha$ to $\alpha(1-q)+1$; IV: periods $\alpha(1-q)$ to 1), state $\theta$, and value of $\gamma(t)$ relative to $\beta + t$, the strategies of the crazy and sane types and the beliefs (probability i attaches to j's being crazy).]

Now we must show that the specified strategies form a Nash equilibrium
in each subgame, and that the beliefs in each period are consistent with
Bayes' rule. We shall consider whether player one's specified behavior is
optimal given his beliefs and player two's specified behavior.

We begin in Phase IV. If $\theta$ equals 2, then, in Phase II, player two
must have been the first to deviate from crazy behavior. Accordingly,
player one attaches probability 1 to two's being crazy. Since a crazy
player who deviates from crazy behavior plays his Nash equilibrium strategy
subsequently, player one expects two to play $e_2$ and, therefore, plays $e_1$
himself.

Next we turn to Phase III. Consider the rational type of player one if
$\theta = 0$, j, or b. He expects his opponent to play the Nash strategy $e_2$ for
the duration of Phase III, and Phase III play has no effect on Phase IV, so
the best player one can do in Phase III is to play his Nash strategy $e_1$.

Now consider some period t in Phase II, i.e., $\alpha + \beta \ge t > \alpha$. First
assume $\theta = 0$. If player one conforms to his specified strategy in Phase II,
his payoff in each period is either $v_1$ or $x_1$. Thus his lowest possible
expected payoff for the remainder of Phase II is $(t-\alpha)x_1$. If he sticks to
specified behavior in Phase III as well, he receives $\alpha q y_1$. Then, since $\theta$
will equal zero at the beginning of Phase IV, player one can obtain $v_1$ each
period if player two is crazy, for an expected Phase IV payoff of at least
$\alpha(1-q)\varepsilon v_1$. Thus if player one conforms in Phase II he receives at least

(18)    $(t-\alpha)x_1 + \alpha q y_1 + \alpha(1-q)\varepsilon v_1$.

If, however, player one deviates in period t of Phase II, his highest
possible payoff that period is $\bar v_1$. Thereafter, player two plays $M_2$ if
crazy, in which case player one receives at most zero, and player two plays
$e_2$ if sane, limiting player one's payoff to $y_1$ per period. Thus the most
player one could expect from deviating is

(19)    $\bar v_1 + (1-\varepsilon)\, t\, y_1$.

Since t is in Phase II, $t - \alpha \le \beta$. Then if $y_1 > 0$, condition (16), which defines
$\alpha_1$ and $q_1$, ensures that deviation is unprofitable. Similarly, if $y_1 =
0$, condition (17) defining $\alpha_1$ again shows that (18) exceeds (19).

If $\theta = 2$ or b in Phase II, player one is sure that player two is crazy.
Thus, one believes that two will play Nash for the rest of the game, so
player one will play Nash as specified.

Finally consider a period t in Phase I. From our specification,
deviations in Phase I do not change the players' beliefs or the value of $\theta$.
Thus from our previous analysis, both players will conform in Phases II and
III regardless of the play in Phase I, so that any sequence of deviations
must end at the start of Phase II.

First assume that $\gamma(t) < \beta + t$, so that t is part of a "punishment
sequence." If player one conforms in t and subsequently, his payoff is

(20)    $(t - \gamma(t) + \beta)x_1 + (\gamma(t) - \alpha - \beta)v_1 + q\alpha y_1 + \Pi^1(0)$,

where $\Pi^1(0)$ is player one's payoff in Phase IV if $\theta = 0$ at the start of that
phase. If player one deviates in period t and thereafter conforms, his
maximum payoff in period t is zero, and he endures the "punishment" of
$x_1$ for the next $\beta$ periods so his payoff is at most

(21)    $\beta x_1 + (t - \beta - \alpha - 1)v_1 + \alpha q y_1 + \Pi^1(0)$,

which is less than (20). In particular, player one would never deviate in
the last period of Phase I, and, by backwards induction, will not wish to
deviate in period t.

Last assume $\gamma(t) \ge \beta + t$, so that player two plays $s_2$ in period t. If
player one deviates in period t but conforms thereafter, he receives at
most

(22)    $\bar v_1 + \beta x_1 + (t - \alpha - \beta - 1)v_1 + \alpha q y_1 + \Pi^1(0)$.

If player one conforms to his prescribed strategy, he receives

(23)    $(t-\alpha)v_1 + \alpha q y_1 + \Pi^1(0)$.

The gain to deviating, the difference between (22) and (23), is thus

(24)    $\bar v_1 + \beta x_1 - (\beta + 1)v_1$.

Since $x_1$ is non-positive, formula (15) defining $\beta$ ensures that (24) is
negative, so player one will not deviate. Thus the specified strategies are
indeed in equilibrium. This equilibrium will yield the payoff $(v_1, v_2)$ for
$(\nu - \alpha - \beta)$ periods, so that by taking $\nu$ sufficiently large we can make each
player i's average payoff arbitrarily near $v_i$.

Q.E.D. 
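
The last step of the proof, that the gain (24) is the difference between (22) and (23) and is negative under (15), can be verified on sample numbers. The following sketch is illustrative: the function names and numerical values are assumptions, and `tail` stands for the common term made up of the Phase III payoff and the Phase IV payoff with state zero, which cancels in the comparison.

```python
def deviate_payoff(t, alpha, beta, vbar, v, x, tail):
    """Expression (22): deviate once, be punished for beta periods, then conform."""
    return vbar + beta * x + (t - alpha - beta - 1) * v + tail

def conform_payoff(t, alpha, v, tail):
    """Expression (23): conform in every remaining period of Phases I and II."""
    return (t - alpha) * v + tail

def gain(beta, vbar, v, x):
    """Expression (24): the difference (22) minus (23)."""
    return vbar + beta * x - (beta + 1) * v

# Hypothetical values: vbar = 3, v = 1, so beta = vbar / v = 3; mutual minimax payoff x = -1.
t, alpha, beta = 20, 4, 3
vbar, v, x, tail = 3, 1, -1, 5
difference = deviate_payoff(t, alpha, beta, vbar, v, x, tail) - conform_payoff(t, alpha, v, tail)
```

Here `difference` equals `gain(beta, vbar, v, x)`, namely -4, so the one-period windfall is more than offset by the punishment periods.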
Remark: David Kreps has pointed out that our equilibrium is not "stable" in
the sense of Kohlberg-Mertens [1982], because of the updating rules we use
in Phase II. We can, however, obtain a stable version of our equilibrium at
the cost of specifying yet more complex "crazy" behavior. Specifically,
assume that at each period in Phase II a crazy player plays as before with
probability $(1-\mu)$, while assigning strictly positive probability to every
other pure strategy. If $\mu$ is sufficiently near zero, the expected payoffs
in every subgame are essentially unchanged, and our strategies are still in
equilibrium. Given that the crazy player "trembles" with positive
probability in Phase II, any deviation in that phase must reveal that the
deviator is crazy, as we specified.

VI.  Unobservable Mixed Strategies

The  arguments  in  Sections  II-V  rely  on  mixed  strategies'  being 
observable.   As  we  remarked  in  Section  III,  one  way  that  mixed  strategies 
can  be  observed  is  for  players  to  arrange  for  the  outcomes  of  their  private 
randomizing  devices  to  be  publicly  scrutinized  after  the  fact.   However, 
there  are  many  repeated  games  in  which  such  an  arrangement  is  impractical. 
In  this  section,  therefore,  we  argue  that  our  results  continue  to  hold  when 
only  the  moves  that  players  make  -  and  not  any  randomization  used  to  select 
moves  -  are  observable. 

We suggested earlier that the only significant use that our proofs make
of the assumption that mixed strategies are observable is in supposing that
minimax strategies are observable. The heart of the argument, in Theorem 6,
therefore, is to show that it suffices for other players to observe the
realization of a punisher's random mixed strategy.

Although  we  rule  out  observation  of  private  mixed  strategies,  we 
continue  to  assume,  for  convenience,  that  strategies  can  depend  on  the 
outcome  of  publicly  observed  random  variables.  We  also  impose  the 
nondegeneracy  assumption  of  Theorem  2. 

Theorem 6: Theorem 2 continues to hold when we assume that players can
observe only the past actions of other players rather than their mixed
strategies.

Proof: Choose $s$, $(v_1',\ldots,v_n')$, $(v_1,\ldots,v_n)$, $(M^1,\ldots,M^n)$, and $\nu_i$,
$i, j = 1,\ldots,n$, as in the proof of Theorem 2. For each i and j, consider $M_i^j$,
player i's minimax strategy against j. This strategy is, in general, a
randomization among the $m_i^j$ pure strategies $\{a_i^j(k)\}_{k=1}^{m_i^j}$, where we have chosen
the labels k so that, for each $k = 1,\ldots,m_i^j - 1$,

$g_i(a_i^j(k), M_{-i}^j) \le g_i(a_i^j(k+1), M_{-i}^j)$.

For each k, let

$p_i^j(k) = g_i(a_i^j(k), M_{-i}^j) - g_i(a_i^j(1), M_{-i}^j)$.

The repeated game strategies we shall consider closely resemble those
in the proof of Theorem 2. Player i

(A) plays $s_i$ each period as long as s was played the previous period.
If player j deviates from (A), then player i

(B) plays $M_i^j$ for $\nu_j$ periods.

If player i plays pure strategy $a_i^j(k)$ in periods $t_1,\ldots,t_m$ of phase (B),
define

$r_i^j(k) = \sum_{h=1}^{m} \delta^{t_h - 1} p_i^j(k)$.

Thus, $r_i^j(k)$ is the expected "bonus" that player i obtains from playing
$a_i^j(k)$ rather than $a_i^j(1)$ in those periods. Take $r_i^j = \sum_k r_i^j(k)$. Then, $r_i^j$ is
the total expected bonus from phase (B). Let

$z_i^j = \frac{1-\delta}{\delta^{\nu_j}}\, r_i^j$.

Because $(v_1',\ldots,v_n')$ is in the interior of $V^*$ and $V^*$ has full dimension, there
exists $\varepsilon > 0$ so that, for each j,

$(v_1' + \varepsilon, \ldots, v_{j-1}' + \varepsilon, v_j', v_{j+1}' + \varepsilon, \ldots, v_n' + \varepsilon)$

is in $V^*$. Since $z_i^j$ tends to zero as $\delta$ tends to 1, we can choose $\delta$ big
enough so that, for all i and j, $z_i^j$ is small enough that

$(v_1' + \varepsilon - z_1^j, \ldots, v_{j-1}' + \varepsilon - z_{j-1}^j, v_j', v_{j+1}' + \varepsilon - z_{j+1}^j, \ldots, v_n' + \varepsilon - z_n^j)$

is in $V^*$. If player h deviates from the prescribed behavior in phase (B), the phase
is begun again with j = h. Player i cannot detect whether player h has
deviated from $M_h^j$, but he can observe whether h has deviated from the
support of $M_h^j$. Accordingly, if h so deviates, player i begins phase (B)
again with j = h. Let $T^j(z) = (T_1^j(z),\ldots,T_n^j(z))$ be a vector of strategies
that realizes these payoffs (note that $T^j(z)$ depends on the particular
realization of pure strategies in phase (B)). Now suppose that at the
conclusion of phase (B), player i

(C) plays $T_i^j(z)$ thereafter,

and, if player h deviates from (C), then i begins phase (B) again with j = h.

The strategies $T^j(z)$ are chosen so that player i will be indifferent among
using all the pure strategies in the support of $M_i^j$. The idea is that any
expected advantage that player i obtains from using $a_i^j(k)$ rather than $a_i^j(1)$
in phase (B) is subsequently removed in phase (C). Player i then may as
well randomize as prescribed by $M_i^j$. He will not deviate from the support of
$M_i^j$ since such a deviation will be detected and punished.

Q.E.D. 
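
The bookkeeping behind the indifference argument can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' construction: the function names are hypothetical, and the per-period offset is assumed to be the bonus converted into a constant reduction starting in the first period of phase (C), that is, (1 - delta) * r / delta**nu, since the exact definition is not legible in the source.

```python
def total_bonus(realized, p, delta):
    """Discounted bonus from phase (B).

    realized[t - 1] is the label k of the pure strategy played in period t
    of phase (B); p[k] is the one-period advantage of strategy k over
    strategy 1, discounted by delta ** (t - 1).
    """
    return sum(delta ** (t - 1) * p[k] for t, k in enumerate(realized, start=1))

def per_period_offset(r, delta, nu):
    """Constant payoff reduction from period nu + 1 onward whose discounted
    value equals r (an assumed form of the offset used in phase (C))."""
    return (1 - delta) * r / delta ** nu

# Hypothetical phase (B) of nu = 3 periods with three pure strategies in the support.
p = {1: 0.0, 2: 0.5, 3: 1.0}
realized = [1, 3, 2]
r = total_bonus(realized, p, delta=0.9)
z = per_period_offset(r, delta=0.9, nu=3)
removed = 0.9 ** 3 * z / (1 - 0.9)  # discounted value of losing z in every later period
```

Under this assumed form, `removed` equals `r`, so the advantage earned in phase (B) is exactly cancelled in phase (C); repeating the calculation with a discount factor closer to one gives a smaller `z`, in line with the statement that the offset vanishes as the discount factor tends to one.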